Nexus error serialization #1323

Draft
VegetarianOrc wants to merge 24 commits into main from nexus-error-serialization

@VegetarianOrc VegetarianOrc commented Feb 11, 2026

Note: This PR can't be merged until the corresponding Core PR (temporalio/sdk-core#1109) is merged and updated in this branch, and the nexus-rpc PR (nexus-rpc/sdk-python#45) is merged and released.

What was changed

  • Nexus error serialization now uses Temporal Failures: Instead of the custom Nexus Failure JSON format (nexus.v1.Failure with metadata/details), Nexus handler errors and operation errors are now serialized as standard Temporal failure.v1.Failure protos via the SDK's DataConverter/FailureConverter. This includes the HandlerError to nexus_handler_failure_info and OperationError to CancelledError/ApplicationError conversions.
  • Simplified _nexus.py worker code: Removed the _nexus_error_to_nexus_failure_proto, _operation_error_to_proto, and _handler_error_to_proto helper methods. Errors are now encoded directly through data_converter.encode_failure() into the completion/response proto's failure field.
  • Simplified _exception_to_handler_error: Removed the workaround that inserted an extra ApplicationError at the head of the HandlerError cause chain (previously needed to preserve the HandlerError message when hoisted to the Nexus Failure).
  • Converter improvements: _error_to_failure for HandlerError now uses error.message and error.stack_trace instead of str(error). from_failure uses match/case instead of if/elif chains.
  • Updated protos and sdk-core: Updated nexus_pb2, workflow_activation_pb2, workflow_commands_pb2, and the sdk-core submodule to support the new Nexus failure fields on completion protos.
  • Removed HTTP-based Nexus tests: Deleted test_handler.py and test_handler_async_operation.py, which tested via direct HTTP calls (the HTTP interface is not user-facing). Converted the remaining tests (test_workflow_run_operation.py, test_dynamic_creation_of_user_handler_classes.py) to use workflow callers.
  • Refactored error chain tests: Replaced the class-based ErrorConversionTestCase registry pattern with typed dataclasses (ExpectedNexusOperationError, ExpectedHandlerError, ExpectedApplicationError, ExpectedCancelledError) and explicit ErrorTestService handler methods.
  • Added failure converter unit tests: New tests in test_converter.py for round-trip serialization of HandlerError, NexusOperationError, OperationError, and related types.
  • Cleaned up stale TODOs: Removed TODO comments marked as "Won't Do" in the Nexus TODO triage, and removed the unused self._interceptors variable in the Nexus worker.

Why?

The previous Nexus error serialization used a custom JSON-based format (nexus.v1.Failure with metadata and JSON details) that was specific to the Nexus HTTP protocol. With protocol-level conversion moving to the Core SDK, which chooses the wire format based on server capabilities, the Python SDK now only needs to produce standard Temporal Failure protos.

Checklist

  1. How was this tested:

    • Existing Nexus workflow caller tests converted and expanded to validate error chains end-to-end
    • New unit tests for FailureConverter round-trip serialization of Nexus error types in tests/test_converter.py
    • New typed error chain validation in tests/nexus/test_workflow_caller_error_chains.py
  2. Any docs updates needed?

    • No

VegetarianOrc and others added 15 commits January 9, 2026 13:39
This commit removes TODO comments from the codebase that were triaged and
marked as "Won't Do" in the Nexus Python SDK TODO tracking document. These
comments documented items that were decided not to be implemented or were
already completed.

Removed TODO comments from:
- temporalio/worker/_activity.py (2 comments)
- temporalio/workflow.py (1 comment)
- temporalio/worker/_nexus.py (1 comment)
- temporalio/nexus/_token.py (1 comment)
- tests/nexus/test_workflow_caller.py (3 comments)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add comprehensive round-trip tests for failure converter handling of:
- nexusrpc.HandlerError (all HandlerErrorType values, retryable_override mapping)
- NexusOperationError (all fields, cause chains)
- nexusrpc.OperationError (one-way from_failure)
- nexus_sdk_failure_error_info (one-way to FailureError)
- ResetWorkflowError (with and without heartbeat details)

Also tests fallback behavior for unknown handler error types and operation
error states.

Remove TODO(nexus-preview) comment as test coverage is now in place.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The HTTP interface is not user-facing. Delete test_handler.py and
test_handler_async_operation.py which tested via direct HTTP calls. Convert
test_workflow_run_operation.py and test_dynamic_creation_of_user_handler_classes.py
to use workflow callers instead. Remove HTTP-specific helper code (ServiceClient,
Failure, dataclass_as_dict) from tests/helpers/nexus.py.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace the class-based ErrorConversionTestCase pattern with a simpler
dataclass-based approach:

- Remove ErrorConversionTestCase base class and registry pattern
- Add typed dataclasses for expected exceptions: ExpectedNexusOperationError,
  ExpectedHandlerError, and ExpectedApplicationError
- Move operation implementations from class methods to explicit ErrorTestService
  handler methods
- Replace tuple-based expected_exception_chain_in_workflow with typed
  expected_exception_chain using the new dataclasses
- Refactor _validate_exception_chain to use isinstance() checks instead of
  string key comparisons on dict[str, Any]
- Update workflow to call operations by name rather than through registry lookup

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
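
The dataclass-based approach described in this commit can be pictured with a minimal sketch. It is not the code from tests/nexus/test_workflow_caller_error_chains.py: FakeHandlerError, FakeApplicationError, chain_mismatches, and every field name below are hypothetical stand-ins chosen so the example runs without the SDK. Only the idea (typed expectations compared with isinstance() while walking the cause chain) comes from the commit message.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


# Hypothetical stand-ins for the real exception types; field names are assumptions.
class FakeHandlerError(Exception):
    def __init__(self, message: str, retryable: bool) -> None:
        super().__init__(message)
        self.message = message
        self.retryable = retryable


class FakeApplicationError(Exception):
    def __init__(self, message: str, non_retryable: bool = False) -> None:
        super().__init__(message)
        self.message = message
        self.non_retryable = non_retryable


# Typed expectations, one dataclass per exception kind in the chain.
@dataclass(frozen=True)
class ExpectedHandlerError:
    message: str
    retryable: bool


@dataclass(frozen=True)
class ExpectedApplicationError:
    message: str
    non_retryable: bool = False


Expected = Union[ExpectedHandlerError, ExpectedApplicationError]


def chain_mismatches(err: BaseException, expected: list[Expected]) -> list[str]:
    """Walk err.__cause__ and compare each link against its typed expectation."""
    actual: list[BaseException] = []
    cur: BaseException | None = err
    while cur is not None:
        actual.append(cur)
        cur = cur.__cause__
    if len(actual) != len(expected):
        return [f"chain length {len(actual)} != expected {len(expected)}"]

    problems: list[str] = []
    for i, (exc, exp) in enumerate(zip(actual, expected)):
        if isinstance(exp, ExpectedHandlerError):
            ok = isinstance(exc, FakeHandlerError) and (
                exc.message,
                exc.retryable,
            ) == (exp.message, exp.retryable)
        else:
            ok = isinstance(exc, FakeApplicationError) and (
                exc.message,
                exc.non_retryable,
            ) == (exp.message, exp.non_retryable)
        if not ok:
            problems.append(f"link {i}: got {exc!r}, expected {exp!r}")
    return problems
```

Compared with string keys on dict[str, Any], a misspelled field in an expectation fails at construction time instead of silently never matching.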
… no longer apply with the most recent api update
…havior when reset_workflow_failure_info is set on a failure. Remove some stale comments
temporalio.exceptions.FailureError
| nexusrpc.HandlerError
)
match failure.WhichOneof("failure_info"):
@VegetarianOrc VegetarianOrc Feb 11, 2026

I changed this to use match to help ensure all cases are handled explicitly. The change was more useful when a previous version of the API added more failure info types. I can switch back to the old if/elif style if that is preferred.

VegetarianOrc and others added 9 commits February 10, 2026 18:15
…failure when deserializing or serializing HandlerErrors
Add comprehensive tests for RPCError -> HandlerError conversion in Nexus
operations:

- test_rpc_error_fails_without_retry: Tests non-retryable RPCError status
  codes (INVALID_ARGUMENT, ALREADY_EXISTS, FAILED_PRECONDITION, OUT_OF_RANGE,
  NOT_FOUND, UNIMPLEMENTED) and verifies they map to correct HandlerErrorType
  without retry

- test_rpc_error_is_retried: Tests retryable RPCError status codes (ABORTED,
  UNAVAILABLE, CANCELLED, DATA_LOSS, INTERNAL, UNKNOWN, UNAUTHENTICATED,
  PERMISSION_DENIED, RESOURCE_EXHAUSTED, DEADLINE_EXCEEDED, OK) and verifies
  they cause retry behavior

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
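
The two status-code groups listed in this commit message can be written down as a small lookup. This is a sketch only: is_retried and the two set names are hypothetical, status codes are plain strings rather than the SDK's status enum, and the mapping to specific HandlerErrorType values is omitted because the commit message does not spell it out.

```python
# Status codes the commit message lists as failing without retry.
NON_RETRYABLE_STATUSES = frozenset(
    {
        "INVALID_ARGUMENT",
        "ALREADY_EXISTS",
        "FAILED_PRECONDITION",
        "OUT_OF_RANGE",
        "NOT_FOUND",
        "UNIMPLEMENTED",
    }
)

# Status codes the commit message lists as causing retry behavior.
RETRYABLE_STATUSES = frozenset(
    {
        "ABORTED",
        "UNAVAILABLE",
        "CANCELLED",
        "DATA_LOSS",
        "INTERNAL",
        "UNKNOWN",
        "UNAUTHENTICATED",
        "PERMISSION_DENIED",
        "RESOURCE_EXHAUSTED",
        "DEADLINE_EXCEEDED",
        "OK",
    }
)


def is_retried(status_name: str) -> bool:
    """Return True if an RPCError with this status is expected to be retried."""
    if status_name in NON_RETRYABLE_STATUSES:
        return False
    if status_name in RETRYABLE_STATUSES:
        return True
    raise ValueError(f"unknown status code: {status_name}")
```

Raising on an unknown name, instead of defaulting to either group, keeps a newly introduced status code from being classified by accident.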
… Add tests to cover the two failure scenarios.